Super Scalar Sample Sort

Authors

  • Peter Sanders
  • Sebastian Winkel
Abstract

Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions. This transformation facilitates optimizations like loop unrolling and software pipelining. The final implementation, albeit cache efficient, is limited by a linear number of memory accesses rather than by the O(n log n) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL library, which is known as one of the fastest available quicksort implementations.
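
The branch-free classification idea above can be illustrated with a small sequential sketch. This is a minimal C++ sketch under stated assumptions, not the authors' implementation: the function names sample_sort, classify and build_tree, the bucket count k = 16, the oversampling factor 8 and the base-case cutoff 256 are hypothetical choices. The k - 1 splitters are stored as an implicit binary search tree in heap layout, so each element descends it with j = 2j + (tree[j] < e). Each comparison result becomes an integer added to an index rather than the condition of a jump. Whether the compiler actually emits a set or conditional-move instruction (or, on Itanium, a predicated one) for that comparison depends on the compiler and target; C++ cannot force it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Store sorted splitters [lo, hi) as an implicit BST: node j holds the median,
// children are 2j and 2j+1 (heap layout, root at index 1).
static void build_tree(const std::vector<int>& sorted, std::vector<int>& tree,
                       std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = sorted[mid];
    build_tree(sorted, tree, 2 * node, lo, mid);
    build_tree(sorted, tree, 2 * node + 1, mid + 1, hi);
}

// Descend log_k levels; the comparison result (0 or 1) is added to the index,
// so the loop body contains no data-dependent branch. Returns the number of
// splitters strictly smaller than e, i.e. the bucket index in [0, k).
static inline std::size_t classify(const std::vector<int>& tree, int e,
                                   std::size_t log_k) {
    std::size_t j = 1;
    for (std::size_t level = 0; level < log_k; ++level)
        j = 2 * j + static_cast<std::size_t>(tree[j] < e);
    return j - (std::size_t{1} << log_k);
}

void sample_sort(std::vector<int>& a) {
    constexpr std::size_t log_k = 4;
    constexpr std::size_t k = std::size_t{1} << log_k;  // hypothetical bucket count
    constexpr std::size_t oversample = 8;               // hypothetical sampling factor
    constexpr std::size_t base_case = 256;              // hypothetical cutoff
    const std::size_t n = a.size();
    if (n <= base_case) { std::sort(a.begin(), a.end()); return; }

    // 1. Draw a random sample and take k - 1 evenly spaced splitters from it.
    std::mt19937 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<int> sample(oversample * k);
    for (int& s : sample) s = a[pick(rng)];
    std::sort(sample.begin(), sample.end());
    std::vector<int> splitters(k - 1);
    for (std::size_t i = 0; i + 1 < k; ++i) splitters[i] = sample[(i + 1) * oversample];
    std::vector<int> tree(k);
    build_tree(splitters, tree, 1, 0, k - 1);

    // 2. Classify each element once, remember its bucket and count bucket sizes.
    std::vector<std::uint8_t> oracle(n);
    std::vector<std::size_t> count(k + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t b = classify(tree, a[i], log_k);
        oracle[i] = static_cast<std::uint8_t>(b);
        ++count[b + 1];
    }

    // 3. Prefix sums give bucket start offsets; distribute into a scratch array.
    for (std::size_t b = 0; b < k; ++b) count[b + 1] += count[b];
    std::vector<int> out(n);
    std::vector<std::size_t> pos(count.begin(), count.end() - 1);
    for (std::size_t i = 0; i < n; ++i) out[pos[oracle[i]]++] = a[i];
    assert(pos[k - 1] == n);  // the last bucket's write cursor ends at n

    // 4. Sort each bucket recursively and copy it back. If one bucket received
    //    every element (e.g. all keys equal), fall back to std::sort to make progress.
    auto off = [](std::size_t i) { return static_cast<std::ptrdiff_t>(i); };
    for (std::size_t b = 0; b < k; ++b) {
        std::vector<int> bucket(out.begin() + off(count[b]), out.begin() + off(count[b + 1]));
        if (bucket.size() == n) std::sort(bucket.begin(), bucket.end());
        else sample_sort(bucket);
        std::copy(bucket.begin(), bucket.end(), a.begin() + off(count[b]));
    }
}
```

Keeping each element's bucket number in an oracle array means the distribution pass only rereads that array instead of repeating the comparisons. Unlike a tuned implementation, this sketch copies every bucket into a fresh vector before recursing and does no manual loop unrolling or software pipelining.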

Similar resources

BlockQuicksort: Avoiding Branch Mispredictions in Quicksort

Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant ...

BlockQuicksort: How Branch Mispredictions don't affect Quicksort

Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant ...

Vector Sd-rom Filter for Removal of Impulse Noise from Color Images

One well-studied image processing task is the removal of impulse noise from images. Impulse noise can be introduced during image capture, during transmission, or during storage. The signal-dependent rank order mean (SD-ROM) filter has been shown to be effective at removing impulses from 2-D scalar-valued signals while preserving many details and other features. The algorithm is based on a state...

Super Scalar Processor using Chip Level Optical Interconnections

In this paper we present the design of a super-scalar processor constructed using optoelectronic components interconnected via high-speed free-space optical buses.

Super-Scalar Processor Design

A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instructio...

Publication date: 2004